efficient online learning
Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
We consider a natural model of online preference aggregation, where sets of preferred items R_1, R_2, ..., R_t, ..., along with a demand for k_t items in each R_t, appear online. Without prior knowledge of (R_t, k_t), the learner maintains a ranking \pi_t aiming that at least k_t items from R_t appear high in \pi_t. This is a fundamental problem in preference aggregation with applications to, e.g., ordering product or news items in web pages based on user scrolling and click patterns. The widely studied Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above. GMSSC is NP-hard, and the standard application of no-regret online learning algorithms is computationally inefficient because they operate in the space of rankings. In this work, we show how to achieve low regret for GMSSC in polynomial time. We employ dimensionality reduction from rankings to the space of doubly stochastic matrices, where we apply Online Gradient Descent. A key step is to show how subgradients can be computed efficiently, by solving the dual of a configuration LP. Using deterministic and randomized rounding schemes, we map doubly stochastic matrices back to rankings with a small loss in the GMSSC objective.
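The pipeline can be pictured with a deliberately simplified sketch. The loss surrogate, the Sinkhorn-style projection, and the rounding rule below are illustrative assumptions, not the paper's algorithm (which computes exact subgradients from the dual of a configuration LP and uses dedicated rounding schemes):

```python
import numpy as np

def sinkhorn(M, iters=50):
    """Approximately project a nonnegative matrix onto the Birkhoff polytope
    (doubly stochastic matrices) by alternating row/column normalization."""
    M = np.clip(M, 1e-12, None)
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)   # normalize rows
        M = M / M.sum(axis=0, keepdims=True)   # normalize columns
    return M

def round_to_ranking(A):
    """Deterministic rounding: order items by expected position under A,
    where A[i, j] is read as the weight of item i sitting at position j."""
    expected_pos = A @ np.arange(1, A.shape[1] + 1)
    return np.argsort(expected_pos)            # best item first

def gmssc_cost(ranking, request, demand):
    """Position at which the demand-th requested item is covered."""
    covered = 0
    for pos, item in enumerate(ranking, start=1):
        if item in request:
            covered += 1
            if covered == demand:
                return pos
    return len(ranking)                        # demand never met

# Illustrative online loop with a crude surrogate subgradient
# (the paper instead derives subgradients from a configuration-LP dual).
n, T, eta = 6, 20, 0.1
rng = np.random.default_rng(0)
A = np.full((n, n), 1.0 / n)                   # uniform doubly stochastic start
for t in range(T):
    request = set(rng.choice(n, size=3, replace=False).tolist())
    demand = 2
    ranking = round_to_ranking(A)
    loss = gmssc_cost(ranking, request, demand)
    G = np.zeros((n, n))
    for i in request:                          # penalize late positions for requested items
        G[i, :] = np.arange(n)
    A = sinkhorn(A - eta * G / G.max())        # gradient step + approximate projection
```

The essential point the sketch tries to convey is that every object manipulated online is an n x n matrix rather than one of n! rankings, and a final rounding step converts that matrix back into a usable permutation.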
Efficient Online Learning with Predictive Coding Networks: Exploiting Temporal Correlations
Zadeh-Jousdani, Darius Masoum, Hajizada, Elvin, Hüllermeier, Eyke
Robotic systems operating at the edge require efficient online learning algorithms that can continuously adapt to changing environments while processing streaming sensory data. Traditional backpropagation, while effective, conflicts with biological plausibility principles and may be suboptimal for continuous adaptation scenarios. The Predictive Coding (PC) framework offers a biologically plausible alternative with local, Hebbian-like update rules, making it suitable for neuromorphic hardware implementation. However, PC's main limitation is its computational overhead due to multiple inference iterations during training. We present Predictive Coding Network with Temporal Amortization (PCN-TA), which preserves latent states across temporal frames. By leveraging temporal correlations, PCN-TA significantly reduces computational demands while maintaining learning performance. Our experiments on the COIL-20 robotic perception dataset demonstrate that PCN-TA requires 10% fewer weight updates than backpropagation and 50% fewer inference steps than baseline PC networks. These efficiency gains directly reduce computational overhead, moving another step toward edge deployment and real-time adaptation in resource-constrained robotic systems. The biologically-inspired nature of our approach also makes it a promising candidate for future neuromorphic hardware implementations, enabling efficient online learning at the edge.
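As a rough illustration of the temporal-amortization idea (the layer sizes, learning rates, and update rules below are assumptions for a one-layer toy model, not the PCN-TA implementation): a predictive coding layer refines a latent state by iterating on its prediction error, and carrying the converged latent from one frame to the next, instead of reinitializing it, lets temporally correlated frames get by with far fewer inference iterations.

```python
import numpy as np

class PCLayer:
    """One-layer predictive coding sketch: latent z predicts input x via W.
    Inference refines z by gradient steps on the prediction error;
    learning applies a local, Hebbian-like update to W."""
    def __init__(self, n_in, n_latent, lr_z=0.1, lr_w=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_in, n_latent))
        self.lr_z, self.lr_w = lr_z, lr_w

    def infer(self, x, z_init, steps):
        z = z_init.copy()
        for _ in range(steps):
            err = x - self.W @ z              # prediction error
            z += self.lr_z * (self.W.T @ err)
        return z

    def learn(self, x, z):
        err = x - self.W @ z
        self.W += self.lr_w * np.outer(err, z)   # local Hebbian-like update

# Temporal amortization: reuse the previous frame's latent as the next
# frame's starting point, so correlated frames need fewer inference steps.
layer = PCLayer(n_in=64, n_latent=16)
rng = np.random.default_rng(1)
frames = np.cumsum(rng.normal(scale=0.05, size=(30, 64)), axis=0)  # slowly drifting "video"
z = np.zeros(16)
for t, x in enumerate(frames):
    steps = 20 if t == 0 else 5               # amortized: fewer steps after the first frame
    z = layer.infer(x, z_init=z, steps=steps)
    layer.learn(x, z)
```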
- Law > Litigation (0.87)
- Education > Educational Setting > Online (0.82)
Review for NeurIPS paper: Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
Summary and Contributions: The paper studies a variant of the online ranking problem. In the offline setting, preferred items belong to different groups, and one needs to generate a sequence of items so that, qualitatively speaking, for each group R_t of items at least k_t elements appear high in the list. More formally, they formulate the problem as an online version of the known "Generalized Min-Sum Set Cover" (GMSSC) task: in this problem, given a set U = {1,...,n} of n items, in each step:
1. The learner selects a permutation \pi_t of the items in U.
2. The adversary selects a request R_t \subset U with demand k_t.
The goal is to minimize the multiplicative regret, that is, the ratio of the learner's total cost to the cost of selecting a fixed optimal permutation \pi^* in all steps.
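To make the protocol and the multiplicative-regret objective concrete, here is a tiny brute-force illustration (the cost convention for unmet demands and the toy instance are assumptions): the cost of a permutation on a request (R_t, k_t) is the position at which the k_t-th requested item appears, and the multiplicative regret is the ratio of the learner's total cost to that of the best fixed permutation in hindsight.

```python
from itertools import permutations

def cost(pi, request, k):
    """Position at which the k-th item of `request` appears in permutation `pi`."""
    covered = 0
    for pos, item in enumerate(pi, start=1):
        if item in request:
            covered += 1
            if covered == k:
                return pos
    return len(pi)   # convention used here if the demand is never met

# A toy instance over n = 4 items and three requests (R_t, k_t).
items = (0, 1, 2, 3)
requests = [({0, 1}, 1), ({2, 3}, 2), ({1, 2}, 1)]

# Some online learner's choices (kept fixed here, purely for illustration).
learner_total = sum(cost(items, R, k) for R, k in requests)

# Best fixed permutation in hindsight (brute force; only feasible for tiny n).
best_total = min(sum(cost(pi, R, k) for R, k in requests)
                 for pi in permutations(items))

multiplicative_regret = learner_total / best_total
print(learner_total, best_total, multiplicative_regret)
```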
Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
We consider a natural model of online preference aggregation, where sets of preferred items R_1, R_2, ..., R_t, ..., along with a demand for k_t items in each R_t, appear online. Without prior knowledge of (R_t, k_t), the learner maintains a ranking \pi_t aiming that at least k_t items from R_t appear high in \pi_t. This is a fundamental problem in preference aggregation with applications to, e.g., ordering product or news items in web pages based on user scrolling and click patterns. The widely studied Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above. GMSSC is NP-hard, and the standard application of no-regret online learning algorithms is computationally inefficient because they operate in the space of rankings.
- Information Technology > Enterprise Applications > Human Resources > Learning Management (0.65)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.45)
Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach
Tang, Dengwang, Jain, Rahul, Hao, Botao, Wen, Zheng
In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with. We assume that the offline dataset is generated by an expert, but with an unknown level of competence, i.e., it is not perfect and not necessarily using the optimal policy. We show that if the learning agent models the behavioral policy (parameterized by a competence parameter) used by the expert, it can do substantially better in terms of minimizing cumulative regret than if it does not. We establish an upper bound on the regret of the exact informed PSRL algorithm that scales as $\tilde{O}(\sqrt{T})$. This requires a novel prior-dependent regret analysis of Bayesian online learning algorithms for the infinite horizon setting. We then propose an approximate Informed RLSVI algorithm that can be interpreted as performing imitation learning with the offline dataset, followed by online learning.
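A loose caricature of the informed-prior idea (the tiny MDP, the Dirichlet prior, and the scalar competence weight are placeholders, not the paper's iPSRL or Informed RLSVI algorithms): expert transitions from the offline dataset are folded into the prior as weighted pseudo-counts, and the agent then runs a posterior-sampling loop on top of that informed prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny tabular MDP: 2 states, 2 actions, known rewards, unknown transitions.
n_s, n_a, gamma = 2, 2, 0.9
R = np.array([[0.0, 0.1],                       # R[s, a]
              [1.0, 0.2]])
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P_true[s, a, s']
                   [[0.7, 0.3], [0.1, 0.9]]])

# Dirichlet pseudo-counts over transitions; start uniform.
counts = np.ones((n_s, n_a, n_s))

# "Offline dataset": expert tuples (s, a, s'). With an assumed competence weight,
# each expert transition simply adds weighted pseudo-counts to the prior.
offline = [(0, 1, 1), (1, 0, 0), (0, 1, 1), (1, 0, 1)]
competence = 2.0                                # assumed trust in the expert
for s, a, s2 in offline:
    counts[s, a, s2] += competence

def greedy_policy(P, R, gamma, iters=200):
    """Value iteration on a sampled model; returns the greedy policy."""
    V = np.zeros(n_s)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    Q = R + gamma * P @ V
    return Q.argmax(axis=1)

# Posterior-sampling loop (resampled every step purely for brevity;
# PSRL-style methods typically resample once per episode).
s = 0
for t in range(500):
    P_sample = np.array([[rng.dirichlet(counts[si, ai]) for ai in range(n_a)]
                         for si in range(n_s)])
    pi = greedy_policy(P_sample, R, gamma)
    a = pi[s]
    s2 = rng.choice(n_s, p=P_true[s, a])
    counts[s, a, s2] += 1.0                     # online data keeps sharpening the posterior
    s = s2
```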
- Information Technology > Enterprise Applications > Human Resources > Learning Management (0.80)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)
Efficient Online Learning via Randomized Rounding
Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning.
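For intuition on the rounding half of that combination (a generic illustration of randomized rounding, not the paper's procedure): a fractional point is replaced by a random discrete point whose expectation matches it, so for linear losses the expected loss is unchanged while the played decision stays combinatorially simple.

```python
import numpy as np

def randomized_round(x, rng):
    """Round a fractional vector x in [0, 1]^n to a binary vector whose
    expectation equals x: coordinate i is set to 1 with probability x[i]."""
    return (rng.random(x.shape) < x).astype(float)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.5, 0.9])
samples = np.stack([randomized_round(x, rng) for _ in range(10_000)])
print(samples.mean(axis=0))   # close to x: the rounding is unbiased
```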
Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control
Zhou, Hongyu, Xu, Zirui, Tzoumas, Vasileios
Projection operations are a typical computation bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., that minimizes the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real-time, accounting for how past decisions affect the present. Examples of such applications are: online control of dynamical systems; statistical arbitrage; and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop a controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
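To see why Frank-Wolfe steps remove the projection bottleneck (a generic Online Frank-Wolfe sketch on the probability simplex, not the paper's memory-aware meta-base algorithm): each round calls only a linear minimization oracle over the feasible set and takes a convex combination, so the iterate stays feasible without ever computing a projection.

```python
import numpy as np

def lmo_simplex(g):
    """Linear minimization oracle over the probability simplex:
    argmin_{v in simplex} <g, v> is the vertex at g's smallest coordinate."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

# Online Frank-Wolfe sketch with quadratic losses f_t(x) = ||x - y_t||^2.
rng = np.random.default_rng(0)
n, T = 5, 200
x = np.full(n, 1.0 / n)
for t in range(1, T + 1):
    y_t = rng.dirichlet(np.ones(n))        # time-varying target chosen by the environment
    grad = 2.0 * (x - y_t)                 # gradient of the current loss at x
    v = lmo_simplex(grad)                  # one LMO call, no projection needed
    step = 2.0 / (t + 2)                   # standard Frank-Wolfe step size
    x = (1 - step) * x + step * v          # convex combination stays feasible
```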
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)